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Abstract 

We use the torsional angles of the protein chain as generalized coor- 
dinates in the canonical formalism, derive canonical equations of motion, 
and investigate the coordinate dependence of the kinetic energy expressed 
in terms of the canonical momenta. We use the formalism to compute 
the normal-frequency distributions of the a-hclix and the /3-sheet, under 
the assumption that they are stabilized purely through hydrogen bonding. 
Comparison of their free energies show the existence of a phase transition 
between the a-helix and the /3-sheet at a critical temperature. 



1 Introduction and results 

The purpose of this work is to describe the backbone chain of a protein molecule 
in terms of dynamically independent variables, which are the torsional angles 
between successive units of the chain. These angles determined the average 
conformation of the chain, and local vibrations of chemical bonds only contribute 
to small fluctuations about the average. By ignoring these fluctuations, we gain 
a better overview of the motion of the chain, in particular its folding. 

We use the torsional angles as generalized coordinates in the canonical for- 
malism, with concomitant canonical momenta. The kinetic energy then becomes 
a function of the coordinates, when expressed in term of the canonical momenta. 
That is, masses arc replaced by a generalized mass matrix, which is a function 
of the coordinates. There arises an effective potential, which is discussed in de- 
tailed later. By studying this mass matrix numerically, we find that the effective 
potential is approximately constant for almost all conformations of the chain. 



This result is significant for practical applications, particularly for the CSAW 
(conditioned self-avoiding walk) model fV, '2] , where such an approach was first 
used. 

Using the canonical formalism, we formulate the eigenvalue problem that 
describes the normal modes of the system with respect to an equilibrium con- 
formation. Actual computations are carried out for a pure a-helix and a pure 
/3-sheet, to obtain distribution functions of the normal frequencies. These pure 
structures are hypothetical, of course, for in a real situation they are embed- 
ded inside a larger protein. However, we can learn something useful from these 
examples. 

First of all, in our model we assume that the a-helix and the /3-sheet are 
stabilized purely through hydrogen bonding. The positive-definiteness of normal 
frequencies indicates that these structures can maintain mechanical stability 
from hydrogen bonding. This leads one to expect that in the unfolded protein 
chain, which is subject to random forces from the solution, these secondary 
structures may still have transient existence. 

The normal- frequency distribution function for the a-helix exhibits a number 
of peaks. By examining the corresponding eigenvectors, we can associate them 
with types of distortion, namely stretching, twisting and bending. By super- 
posing the distributions of several a-helices, we can construct an approximate 
normal-frequency distribution of an all-a protein, such as myoglobin. 

We can obtain the free energy of a structure near equilibrium by treating 
the system as a collection of harmonic oscillators with the calculated normal 
frequencies. As such an exercise, we compute the free energy of a pure a-helix 
and that of a pure /9-sheet, and plot the results as functions of temperature. 
We find that the two curves intersect, indicating a phase transition occurring 
at that temperature. Such a model is of course too crude to have quantitative 
significance, for real secondary structures are embedded in a larger protein, and 
interactions not taken into account here may be important. However, in view of 
the importance of the subject, in particular its possible relevance to the prion 
transition ^ , any exploratory calculation in this direction might not be totally 
meaningless. 

2 Modelling the protein chain 

The protein chain consists of a sequence of amino acids chosen from a pool 
of 20. These amino acids all center about a carbon atom called the Cq, and 
differ from one another only in the side chains connected to the Cq.. When the 
amino acids are joined into a chain, they become interlocked "residues" . From a 
dynamical point of view, the independent units of the chain are "cranks" made 
up of coplanar chemical bonds, which connect one to the next, as shown in 
Fig. [T| The bond lengths and bond angles in a crank are given in Table [l]|l]. 
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Figure 1 : Upper panel shows the "crank" that connects the center of one residue 
to the next. The vectors a, b, c, d, e represent chemical bonds, which all lie in 
the same plane. (Side chains have been omitted for clarity). Lower panel shows 
how the cranks are connected to form the backbone of the protein. The vector 
position of a is denoted by Rcqi- The angle between cranks being fixed, 
the relative orientation of successive cranks is specified by two torsional angles 
4) and if). The conformation of the backbone chain is completely specified by a 
set of torsional angles. Data for bond lengths and angles are given in Table [T| 



Table 1: Bond lengths and bond angles. Data obtained from Protein Data Bank 
web site (|http: / / ww w.pdb.orgj ) [4] . 





Bond Length (A) 




Bond Angle (°) 


Cq.-C 


1.525 


ZC„CN 


116.2 


C-N 


1.329 


ZCNCa 


121.7 




1.458 


ZNCaC 


109.5 


C-0 


1.231 


zc„co 


120.8 


N-H 


1.000 


ZC„NH 


114.0 
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The backbone of the protein chain is thus a sequence of cranks. The angle 
between two adjacent cranks is fixed at the tetrahedral angle cos~^(— 1/3) ~ 
109.5°. Thus, the orientation of one crank with respect to its predecessor is 
specified by two torsional angles {(/), as illustrated in Fig. [l] 

The conformation of the backbone of the protein is completely specified by a 
set of torsional angles {0i, V'l; 02, ^^2; ■ ■ • }■ In this study, we only consider these 
torsional degrees of freedom, ignoring the small high-frequency vibrations within 
the cranks. Such a description has been used in the CSAW model (conditioned 
self-avoiding walk) of protein folding. [H [2] 

3 Canonical formalism 

Consider a chain of n cranks. Let 

Rsi — position vector of atom s on crank i 
i — 1, . . . , n 

s = C„,C, N, O, H (1) 

We also write this as R;, an array of the position vectors of elements labelled 
by s. 

In this study we do not take the side chains into account, so the closest real 
protein to our model is polyglycine. Formally, it is straightforward to generalize 
the model. 

Let the set of torsional angles be {(f>i,ipi}{i — 1, . . . , n — 1). Assuming the 
cranks to be perfectly rigid places constraints on the vector positions. The 
constraints are solved by using the torsional angles as generalized coordinates, 
which we denote by the notation 

q,k = 1; fc= 1,2). (2) 

Thus, for example, 

911 = 01 912 = Ipl 

921 = 02 922 = -02 etc. (3) 

We also use the notation where a ~ {i, k}. We are to regard Hi as functions 
of {qa}- 

The velocity is given by 

r^^-EEa^'^.. = Ea^9.. (4) 

i=l fc=l a ^" 

The total kinetic energy is 

1 ""^ 1 /""^ 9R dTl\ 1 
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where the mass matrix is given by: 

Map = > ^ m- — . 

1~{ "^P 

This is a symmetric matrix, with i\f = M. We use the shorthand 

s 

where nis denotes the mass of atom s on the crank. 
The Lagrangian of the backbone chain is given by 

L{q,q) = K{q,q)~U{q) 
= ]^fMq-U{q). 

Hence, the canonical momentum is 

p= — = Mq. 
oq 



and the generaUzed force is 



dL _l .j,dM . dU 
dq 2^ dq^ dq' 



The Lagrange equation of motion 

fdL\ _ dL 
dt \ dq J dq 

leads to 

I .rdM . dU 



The Hamiltonian is 

H{v.q) = K{p,q) + U{q) 
= ^fMq + U{q). 

From ^ we have q — M^^p and q^ ~ p^ [M^^)'^ . Therefore, 

H{p,q) = ]^p^M-^p+U{q). 



The canonical equations of motion 



take the forms 



dH . dH 



1 j,dM-^ dU 



Nr^p. (16) 



These are, of course, the same as the Lagrangian equation of motion (12 1. 

4 The effective potential 

The partition function of the system is, up to a constant scale factor, given by 
Z= /dq /dpe-^^^^-^'^e-™, (17) 



where /3 = {kBT)~^ is the inverse temperature. The p-integration is Gaussian, 
and can be immediately carried out, and the result generally depends on q. This 
gives rise to an effective potential Vcf^{q)^ which is defined through the relation 

e-/5Vo«(9) ^ j^Ay ' j dp e-'5^(P.«). (18) 

Thus 

Z = J dq pconiq), 

Peo„(g) ^ e-''(^+^-), (19) 

where Pcon is the configurational probability density. That is, Pcondq is the 
relative probability of finding the system in dq, regardless of momentum p. If 
the kinetic energy is independent of q, the effective potential is a constant. 

In a canonical ensemble, the relative probability of finding the state in ele- 
ment dpdq in phase space is given by dpdqexjp {—fiH). If we are only interested 
in the probability of finding the state in dq, we integrate the above over p, and 
obtain 

dq j dpexp{—PH) = dgexp(— /?[/) J dpcxp{—[3K) 

zvr 



dq\j] exp(-/3(C/ + Kff)). (20) 
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Figure 2: Effective potential of a 3-crank chain at different {(p, tp} angles between 
the second and third crank. The percentage change of the effective potential is 
less than 0.2%. 

This is the probability to be used, for example, in the Monte-Carlo algorithm 
in the CSAW model [IlE]. 

We now perform the momentum integration: 

I dp e-'^^^P'^) ^|dp^■■■ dp,^^.,^ exp (^-^p^Af-ip) = [jj ' ^d^. 

(21) 

Thus 

/3Kff (g) = - ^ In (det M) = - ^Tr (In M) . (22) 

Since Ves{q) depends on all the torsional angles, it is a function of the chain 
conformation. Our calculations show that it is sensibly constant for almost all 
different conformations. Representative results are shown in Fig. [2] 

5 Potential energy of hydrogen bonding 

As an application of the canonical formalism, we shall calculate the normal 
modes of the a-helix and the /3-sheet, which are important secondary structures 
in the folded state of a protein. The main stabilizing agents for these structures 
are hydrogen bonds, which exist between N-H and C=0 groups from different 
residues [5] [B]. We assume that a hydrogen bond is formed when the distance 



7 



between the H and O atoms is 2.0 ± 1.0 A, and the bond angle between N-H 
and C=0 is 180±45°[S]. 

The a-helix, also known as the 4i3-helix, is the most abundant secondary 
structure due to its tight conformation In this configuration, a hydrogen 
bond connects the C=0 group of ith crank to the N-H group of (i + 3)th crank. 

The /3-sheet is a two-dimensional mat made up of backbone strands stitched 
together by hydrogen bonds [Bj.The participating strands may be parallel or 
antiparallel. 

We wish to study the normal modes of small vibrations about an equilibrium 
configuration. The potential energy U is assumed to be minimum, and taken 
to be zero, at this configuration. The equilibrium is assumed to be maintained 
by hydrogen bonds. Deviations from equilibrium arise from the stretching and 
bending of these bonds. Let be the bond vector of the ith hydrogen bond, 
i.e. the vector between the O and the bonded H, in the equilibrium situation. 
Let b^ be the same vector when the configuration is displaced from equilibrium. 
The displacement vector is given by 



Ui = b ■ - b, . 

For small displacements, we take the potential energy to be 



X Ui 



(23) 



(24) 



where b^ = bi/|bi|, and ki and K2 are the force constants associated with the 
stretching and bending of hydrogen bonds, respectively [7] 



= 13 N/m , 
K2 = 3 N/m . 

Let the generalized coordinates be denoted 

q = qQ + X 



(25) 



(26) 



where go corresponds to equilibrium, and A represents a small deviation. We 
can write 

K + Oi\^) (27) 



dqa 



where the subscript indicates evaluation at equilibrium. This leads to the 
quadratic form 



U 

Dap 



1 



\^{kiD + K2C)X , 



Ca 







E 

i 

E 



b,- 



b,; X 









dqa 





9qp 



dqa 



hi X 



9b[ 

dqp 



(28) 
(29) 

(30) 
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6 Normal Modes 



For small oscillations about equilibrium, the linearized equation of motion is 



From (28) we have 

dU 

— ^ {k^D + K2C)\ . (32) 
oq 

Thus 

M\+{kiD + K2C)\ = Q. (33) 

The normal frequencies lj and normal modes A are eigenvalues and eigenvectors 
of the equation 

Kr^{KiD + K2C)X = u?\ . (34) 

Our model's validity is subject to the following conditions: 

1. We treat small oscillation about a presumed equilibrium configuration 
go- Whether indeed corresponds to equilibrium can be verified through the 
requirement that all normal frequencies be nonzero and positive. 

2. We ignore electrostatic and other interactions. Our results can serve 
as a test whether the structures investigated can maintain equilibrium purely 
through hydrogen bonding. Inclusion of other interactions will introduce cor- 
rections. 

3. Actual a and (3 structures are embedded inside a protein molecule in 
solution, and are subject to other forces not considered here, particularly those 
arising from Brownian motion in the solution, the hydrophobic effect, and inter- 
action with other atoms in the protein. These forces will give rise to corrections, 
and may even destroy the stability of the structure. 

In view of the limitations of the model, we only examine normal modes in 
a frequency range corresponding to wave numbers 10~^ — 10^ cm~^. This is 
because, in a real protein, the very low-frequency end will be dominated by 
binding effects to the rest of the protein, while the very high-frequency region 
will be dominated by bond oscillations. 



7 The a-helix 

We have modelled a generic a-helix using torsional angles 0, ^ from polyalanine 

m 

{</),V} = {-57.4°, -47.5°}. (35) 

The equivalent spring system is illustrated in Fig. [3] In this example, there are 
7 cranks, but only 4 hydrogen bonds. In general, for n cranks, the number of 
hydrogen bonds is n — 3. The number of degrees of freedom from stretching and 
bending of the hydrogen bonds is thus 2{n — 3). The total number of degrees 
of freedom of the system, however, is 2 (n — 1) . Thus we expect to have 4 zero 
modes, apart from rigid translations and rotations. These will not be included 
in our results. 
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Figure 3: The mechanical system corresponding to small oscillations of the a- 
helix (solid lines). Springs are hydrogen bonds (dashed lines). 



Table 2: Normal modes of a-helix. 



Frequency (cm ^) 


Mode 


0-10 
30-40 
90 - 100 
120 - 130 


twisting 
stretching & bending 
bending 
bending 



Fig. [4]shows the distributions of normal modes as function of wave number, 
for different crank numbers n. All calculated frequencies are positive. The 
upper panel shows distributions for n = 10 — 50, while the lower panel shows 
those for n = 60 — 100. In the latter case, the distributions fit a scaling law 

Density of modes oc n^'^. (36) 

The plotted distributions have been divided by this factor. 

The distributions exhibit 4 peaks associated with various types of deforma- 
tion, which can be ascertained by examining the corresponding eigenvectors. 
The results are listed in Table [21 

As an application of our results, we calculate the normal-mode distribution 
for myoglobin (IMBD) [8 , which is made up of 8 alpha-helices, by superposing 
our calculated distributions. This procedure ignores the interactions between 
helices, and contributions from the loops connecting the helices, and can only 
give the crudest approximation to the actual distribution. The result is shown 
in the upper panel of Fig. [s] The lower panel shows a histogram obtained previ- 
ously by Krimm and Reisdorf [9], using a different method. There is qualitative 
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14 




Wavenumber (cm'^) 



0.07 




Wavenumber (cm'^) 



Figure 4: Normal-frequency distributions for the a-helix, with different numbers 
of cranks n. Upper panel displays cases n = 10 — 50. Lower panel displays cases 
n = 60 — 100, and the distributions are divided by a scaling factor n^-^. Types 
of distortion corresponding to the peaks are listed in Table [2j 
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agreement, but the peaks are shifted, presumably due to interactions neglected 
in our simple superposition. The rough shape of the distribution bears resem- 
blance to that calculated for BPTI, a globular protein with 58 residues [10]. 



8 The /?-sheet 

We model a generic /3-sheet by setting the torsional angles in each strand to 

= {-139°, 135°} (antiparallel case [n]), 

= {-119°, 114°} (parallel case [12]). (37) 

The connectivity of hydrogen bonds for the parallel and antiparallel cases is 
shown in Fig. [6] In the antiparallel case, an extra crank is included to join two 
adjacent strands. In the parallel case, the strands are left open-ended. 

Compared to the a-helix, the /3-sheet has fewer hydrogen bonds formed 
within the structure. Thus we expect that in our model there will be more 
zero modes compared to the a-helix; but we ignore them for reasons stated 
previously. Otherwise, all calculated frequencies are positive. 

Normal frequencies are computed for varying numbers of strands, and cranks 
per strand. We display representative distributions in Fig. [7] for antiparallel 
and parallel sheets. We see that the frequencies are concentrated around 50 
CTO~^. This is consistent with calculations on real protein with /3-sheet structure 
[TTJ [T31 [2] . In general, the peak positions of the distributions depend only on 
the number of cranks per strand, and are independent of the number of strands. 
The peaks tend to widen with increasing crank number. 



9 a-f5 transition 

The transition between a-helix and /3-sheet is an important subject, in view of its 
possible relevance to the prion transition ,3^ , and the existence of proteins with 
ambivalent structures [15 . From our results, we can make a crude calculation, 
which should be taken to be of intuitive, rather than practical value. 

We can obtain the free energy of a structure in the neighborhood of equilib- 
rium, by adding the contributions from all of its normal modes, each treated as 
a harmonic oscillator. For one classical harmonic oscillator of natural frequency 
Lo, at absolute temperature T, the Helmholtz free energy is 

A, = -kBTh-i(^). (38) 
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Waven umber (cm'^) 

Figure 5: Upper panel: Normal- frequency distribution for myoglobin, con- 
structed by superimposing those of the 8 a-helices in the protein. Lower panel: 
Result of an independent calculation by Krimm and Reisdorf [9J . There is qual- 
itative agreement, but the peaks are shifted, possibly due to our neglect of 
interactions between helices, and contributions from loops. 
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Figure 6: Schematic diagram of /3-sheets, illustrating the connectivity of hy- 
drogen bonds. Upper diagram: parallel /3-sheet. Lower diagram: antiparallel 
/3-sheet. 



For N oscillators, corresponding to N normal modes of frequencies wi, . . . , wjv, 
the total free energy is 



An = -NksTln 



Co = {lji ■ ■ ■ ojnY^'^ ■ (39) 

Fig. [8] shows the free energies as a function of temperature, for an a-helix 
and an antiparallel /3-sheet, each having 39 cranks. The /3-sheet is made up of 
5 strands with 7 cranks per strand. 

The expressions for the free energy are valid only when the system is har- 
monic. As models for secondary structures in a real protein, our results are 
expected to be valid only in a certain neighborhood of kBT/hCj = 1. How large 
the neighborhood is depends on interactions between the secondary structure 
and its environment. Taking these curves on face value, we see from Fig. [8]that 
they intersect at = 20 K, above which the a-helix has a lower free energy. 
In this hypothetical system, then, a transition from a-helix to /3-sheet should 
occur at Tc, when the temperature is lowered. 
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Figure 7: Normal-frequency distributions of /3-sheets, for the same number of 
cranks per strand, but different number of strands. 
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Temperature (K) 

Figure 8: Helmholtz free energy of a- helix and /3-sheet, treated as harmonic 
oscillators with the corresponding normal frequencies. The intersection of the 
two curves indicates a phase transition. The equilibrium phase is the one with 
lower free energy. 
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